1.
J Chem Inf Model ; 64(6): 1816-1827, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38438914

ABSTRACT

In drug discovery, the search for new and effective medications is often hindered by concerns about toxicity. Numerous promising molecules fail to pass the later phases of drug development due to strict toxicity assessments. This challenge significantly increases the cost, time, and human effort needed to discover new therapeutic molecules. Additionally, a considerable number of drugs already on the market have been withdrawn or re-evaluated because of their unwanted side effects. Among the various types of toxicity, drug-induced heart damage is a severe adverse effect commonly associated with several medications, especially those used in cancer treatments. Although a number of computational approaches have been proposed to identify the cardiotoxicity of molecules, the performance and interpretability of the existing approaches are limited. In our study, we proposed a more effective computational framework to predict the cardiotoxicity of molecules using an attention-based graph neural network. Experimental results indicated that the proposed framework outperformed the other methods. The stability of the model was also confirmed by our experiments. To assist researchers in evaluating the cardiotoxicity of molecules, we have developed an easy-to-use online web server that incorporates our model.


Subjects
Cardiotoxicity , Drug Development , Humans , Drug Discovery , Heart , Neural Networks, Computer
2.
Comput Struct Biotechnol J ; 21: 3045-3053, 2023.
Article in English | MEDLINE | ID: mdl-37273848

ABSTRACT

N4-methylcytosine (4mC) is one of the most common DNA methylation modifications found in both prokaryotic and eukaryotic genomes. Since 4mC has various essential biological roles, determining its location helps reveal unexplored physiological and pathological pathways. In this study, we propose an effective computational method called i4mC-GRU, which uses a gated recurrent unit and duplet sequence-embedded features to predict potential 4mC sites in mouse (Mus musculus) genomes. To fairly assess the performance of the model, we compared our method with several state-of-the-art methods using two different benchmark datasets. Our results showed that i4mC-GRU achieved area under the receiver operating characteristic curve values of 0.97 and 0.89 and area under the precision-recall curve values of 0.98 and 0.90 on the first and second benchmark datasets, respectively. In brief, our method outperformed existing methods in predicting 4mC sites in mouse genomes. We also deployed i4mC-GRU as an online web server to support users in genomics studies.
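
As an illustration of the AUC-ROC metric quoted in this abstract, the following minimal sketch computes it through the rank-based (Mann-Whitney) formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative. The function name `auc_roc` and the toy labels and scores are assumptions made for illustration; they are not taken from i4mC-GRU.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of positive/negative pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = 4mC site, 0 = non-site.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_auc = auc_roc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a sort-based implementation would be the usual choice for large benchmark sets.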

3.
Comput Struct Biotechnol J ; 21: 751-757, 2023.
Article in English | MEDLINE | ID: mdl-36659924

ABSTRACT

Antibiotic resistance has become one of the most concerning problems directly affecting the recovery of patients. For years, numerous efforts have been made to use antimicrobial drugs efficiently at appropriate doses, not only to exterminate microbes but also to stringently constrain any chance of bacterial evolution. However, choosing proper antibiotics is neither straightforward nor time-effective, because well-defined drugs can only be given to patients after determining the microbial taxonomy and evaluating minimum inhibitory concentrations (MICs). Besides conventional methods, numerous computer-aided frameworks have recently been developed using computational advances and public data sources on clinical antimicrobial resistance. In this study, we introduce eMIC-AntiKP, a computational framework specifically designed to predict the MIC values of 20 antibiotics against Klebsiella pneumoniae. Our prediction models were constructed using convolutional neural networks and k-mer counting-based features. The model for cefepime has the most limited performance, with a test 1-tier accuracy of 0.49, while the model for ampicillin has the highest performance, with a test 1-tier accuracy of 1.00. Most models have satisfactory performance, with test accuracies ranging from about 0.70 to 0.90. The significance of eMIC-AntiKP lies in its effective use of computing resources, which makes it a compact and portable tool for most moderately configured computers. We provide users with two options: an online web server for basic analysis and an offline package for deeper analysis and technical modification.
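
The k-mer counting-based features mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the eMIC-AntiKP pipeline: the function name `kmer_count_vector`, the choice of k, the lexicographic ordering of the vocabulary, and the toy sequence are all assumptions.

```python
from collections import Counter
from itertools import product


def kmer_count_vector(sequence, k=3, alphabet="ACGT"):
    """Return k-mer counts in a fixed lexicographic order over the alphabet.

    Windows containing symbols outside the alphabet (e.g. N) are ignored
    because they never match an entry of the vocabulary.
    """
    sequence = sequence.upper()
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vocabulary = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Hypothetical toy fragment, not real K. pneumoniae data.
toy_vector = kmer_count_vector("ACGTACGN", k=2)
```

For k = 2 the vector has 4^2 = 16 entries (AA, AC, ..., TT); the length grows as 4^k, which is why small k values keep such features compact.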

4.
Proteomics ; 23(1): e2100134, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36401584

ABSTRACT

Nonclassical secreted proteins (NSPs) are proteins released into the extracellular environment through biological transport pathways other than the Sec/Tat system. As experimental determination of NSPs is often costly and requires skilled handling techniques, computational approaches are necessary. In this study, we introduce iNSP-GCAAP, a computational prediction framework for identifying NSPs. We propose using the global composition of a customized set of amino acid properties to encode sequence data and the random forest (RF) algorithm for classification. We used the training dataset introduced by Zhang et al. (Bioinformatics, 36(3), 704-712, 2020) to develop our model and tested it on the independent test set from the same study. The area under the receiver operating characteristic curve on that test set was 0.9256, which outperformed other state-of-the-art methods using the same datasets. Our framework is also deployed as a user-friendly web-based application to support the research community in predicting NSPs.


Subjects
Amino Acids , Proteins , Amino Acids/metabolism , Proteins/chemistry , Software , Computational Biology/methods , Algorithms
5.
J Chem Inf Model ; 62(21): 5050-5058, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-36373285

ABSTRACT

Malaria is a threatening disease that has claimed many lives and has a high annual prevalence. Over the past decade, many studies have sought effective antimalarial compounds to combat this disease. Alongside chemically synthesized compounds, a number of natural compounds have also been proven to have effective antimalarial properties. Besides experimental approaches to investigating antimalarial activities in natural products, computational methods have been developed with satisfactory outcomes. In this study, we propose a novel molecular encoding scheme based on Bidirectional Encoder Representations from Transformers and use our pretrained encoding model, called NPBERT, with four machine learning algorithms, k-Nearest Neighbors (k-NN), Support Vector Machines (SVM), eXtreme Gradient Boosting (XGB), and Random Forest (RF), to develop various prediction models for identifying antimalarial natural products. The results show that SVM models are the best-performing classifiers, followed by the XGB, k-NN, and RF models. Additionally, comparative analysis between our proposed molecular encoding scheme and existing state-of-the-art methods indicates that NPBERT is more effective than the others. Moreover, the use of transformers in constructing molecular encoders is not limited to this study and can be applied to other biomedical applications.


Subjects
Antimalarials , Biological Products , Antimalarials/pharmacology , Antimalarials/chemistry , Biological Products/pharmacology , Support Vector Machine , Machine Learning , Algorithms
6.
BMC Genomics ; 23(Suppl 5): 681, 2022 Oct 03.
Article in English | MEDLINE | ID: mdl-36192696

ABSTRACT

BACKGROUND: Promoters, non-coding DNA sequences located upstream of the transcription start site of genes or gene clusters, are essential regulatory elements for the initiation and regulation of transcription. Identifying promoters in DNA sequences and genomes contributes significantly to discovering the entire structures of genes of interest. Therefore, exploration of promoter regions is one of the most important topics in molecular genetics and biology. Besides experimental techniques, computational methods have been developed to predict promoters. In this study, we propose iPromoter-Seqvec, an efficient computational model to predict TATA and non-TATA promoters in human and mouse genomes using bidirectional long short-term memory neural networks in combination with sequence-embedded features extracted from input sequences. The promoter and non-promoter sequences were retrieved from the Eukaryotic Promoter Database and then refined to create four benchmark datasets. RESULTS: The area under the receiver operating characteristic curve (AUCROC) and the area under the precision-recall curve (AUCPR) were used as two key metrics to evaluate model performance. Results on independent test sets showed that iPromoter-Seqvec outperformed other state-of-the-art methods, with AUCROC values ranging from 0.85 to 0.99 and AUCPR values ranging from 0.86 to 0.99. Models predicting TATA promoters in both species had slightly higher predictive power than those predicting non-TATA promoters. With a novel idea of constructing artificial non-promoter sequences based on promoter sequences, our models were able to learn highly specific characteristics discriminating promoters from non-promoters to improve predictive efficiency. CONCLUSIONS: iPromoter-Seqvec is a stable and robust model for predicting both TATA and non-TATA promoters in human and mouse genomes. Our proposed method was also deployed as an online web server with a user-friendly interface to support research communities. Links to our source code and web server are available at https://github.com/mldlproject/2022-iPromoter-Seqvec .


Subjects
Memory, Short-Term , Software , Animals , Humans , Mice , Promoter Regions, Genetic , Regulatory Sequences, Nucleic Acid , TATA Box/genetics , Transcription Initiation Site , Transcription, Genetic
7.
J Chem Inf Model ; 62(21): 5080-5089, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-35157472

ABSTRACT

Cancer is one of the deadliest diseases, killing millions of people worldwide every year. The search for anticancer medicines has never ceased to seek better and more adaptive agents with fewer side effects. Besides chemically synthesized anticancer compounds, natural products have been scientifically shown to be a highly promising alternative source for anticancer drug discovery. Along with experimental approaches used to find anticancer drug candidates, computational approaches have been developed to virtually screen for potential anticancer compounds. In this study, we construct an ensemble computational framework, called iANP-EC, using machine learning approaches combined with evolutionary computation. Four learning algorithms (k-NN, SVM, RF, and XGB) and four molecular representation schemes are used to build a set of classifiers, among which the four best-performing classifiers are selected to form an ensemble classifier. Particle swarm optimization (PSO) is used to optimize the weights used to combine the four top classifiers. The models are developed using a curated set of 997 compounds collected from the NPACT and CancerHSP databases. The results show that iANP-EC is a stable, robust, and effective framework that achieves an AUC-ROC value of 0.9193 and an AUC-PR value of 0.8366. The comparative analysis of molecular substructures between natural anticarcinogens and nonanticarcinogens partially unveils several key substructures that drive anticancer activities. We also deploy the proposed ensemble model as an online web server with a user-friendly interface to support the research community in identifying natural products with anticancer activities.


Subjects
Antineoplastic Agents , Biological Products , Humans , Biological Products/pharmacology , Algorithms , Machine Learning , Databases, Factual , Antineoplastic Agents/pharmacology
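
The weighted combination of four classifiers described in the abstract of record 7 can be sketched as a weighted average of predicted probabilities. This is only an illustration of the combination step: the function name `weighted_ensemble`, the toy probabilities, and the fixed weights are assumptions, and the particle swarm optimization that the abstract says tunes the weights is not implemented here.

```python
def weighted_ensemble(probabilities, weights):
    """Combine per-classifier probabilities with a weighted average.

    probabilities: one list of predicted probabilities per classifier.
    weights: one weight per classifier; normalized here to sum to 1.
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight per classifier is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    normalized = [w / total for w in weights]
    n_samples = len(probabilities[0])
    return [
        sum(w * clf[i] for w, clf in zip(normalized, probabilities))
        for i in range(n_samples)
    ]


# Hypothetical outputs of four classifiers on three compounds.
toy_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.1, 0.5],
    [0.7, 0.3, 0.4],
    [0.6, 0.4, 0.7],
]
toy_weights = [4, 3, 2, 1]  # placeholder values, not optimized weights
ensemble_scores = weighted_ensemble(toy_probs, toy_weights)
```

An optimizer such as PSO would search over `toy_weights` to maximize a validation metric; normalizing inside the function keeps the ensemble output in the same [0, 1] range as its inputs.
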
8.
Article in English | MEDLINE | ID: mdl-35192082

ABSTRACT

There is a growing body of literature supporting the use of machine learning (ML) to improve diagnostic and prognostic tools for cardiovascular disease. The current study investigated the impact that an ML framework may have on the sensitivity of predicting the presence or absence of congenital heart disease (CHD) using fetal echocardiography. A comprehensive fetal echocardiogram including 2D cardiac chamber quantification, valvar assessments, assessment of great vessel morphology, and Doppler-derived blood flow interrogation was recorded. The postnatal echocardiogram was used to ascertain the diagnosis of CHD. A random forest (RF) algorithm with nested tenfold cross-validation was used to train models for assessing the presence of CHD. The study population was derived from a database of 3910 singleton fetuses with a maternal age of 28.8 ± 5.2 years and a gestational age at the time of fetal echocardiography of 22.0 weeks (IQR 21-24). The proportion of CHD in the studied cohort, confirmed by postnatal echocardiograms, was 14.1%. Our proposed RF-based framework provided a sensitivity of 0.85, a specificity of 0.88, a positive predictive value of 0.55, and a negative predictive value of 0.97 for detecting CHD, with a mean of mean ROC curves of 0.94 and a mean of mean PR curves of 0.84. Additionally, the first six features, namely cardiac axis, peak velocity of blood flow across the pulmonic valve, cardiothoracic ratio, pulmonary valvar annulus diameter, right ventricular end-diastolic diameter, and aortic valvar annulus diameter, are essential features that play crucial roles in adding predictive value to the model in detecting patients with CHD. ML using RF can provide increased sensitivity in prenatal CHD screening with very good performance. The incorporation of ML algorithms into fetal echocardiography may further standardize the assessment for CHD.
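
The relationship between the sensitivity, specificity, prevalence, and predictive values quoted in this abstract can be checked with Bayes' rule. The sketch below is a worked illustration only; the function name `predictive_values` is an assumption, and small differences from the reported figures are to be expected, since those presumably come from averaging over cross-validation folds.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) implied by Bayes' rule for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Values quoted in the abstract: sensitivity 0.85, specificity 0.88,
# CHD proportion 14.1%.
ppv, npv = predictive_values(0.85, 0.88, 0.141)
```

With these inputs the implied values are roughly 0.54 for PPV and 0.97 for NPV, in line with the reported 0.55 and 0.97. The low PPV despite good sensitivity and specificity reflects the 14.1% prevalence.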

9.
J Chem Inf Model ; 62(21): 5059-5068, 2022 Nov 14.
Article in English | MEDLINE | ID: mdl-34672553

ABSTRACT

The human cytochrome P450 (CYP) superfamily is responsible for the metabolism of both endogenous and exogenous compounds such as drugs, cellular metabolites, and toxins. Inhibition of the CYP enzymes is closely associated with adverse drug reactions, including metabolic failures and induced side effects. In modern drug discovery, identification of potential CYP inhibitors is therefore essential. Alongside experimental approaches, numerous computational models have been proposed to address this biochemical issue. In this study, we introduce iCYP-MFE, a computational framework for virtual screening of inhibitors of the CYP 1A2, 2C9, 2C19, 2D6, and 3A4 isoforms. iCYP-MFE contains a set of five robust, stable, and effective prediction models developed using multitask learning combined with molecular fingerprint-embedded features. The results show that multitask learning can remarkably leverage useful information from related tasks to promote global performance. Comparative analysis indicates that, relative to state-of-the-art methods, iCYP-MFE performs better on three tasks, equivalently on one task, and less effectively on one task. The area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR) were the two decisive metrics used for model evaluation. The prediction task for CYP2D6 inhibition achieves the highest AUC-ROC value of 0.93, while the prediction task for CYP1A2 inhibition obtains the highest AUC-PR value of 0.92. The substructural analysis preliminarily explains the nature of the CYP-inhibitory activity of compounds. An online web server for iCYP-MFE with a user-friendly interface was also deployed to support scientific communities in identifying CYP inhibitors.


Subjects
Cytochrome P-450 Enzyme Inhibitors , Cytochrome P-450 Enzyme System , Humans , Cytochrome P-450 Enzyme Inhibitors/pharmacology , Cytochrome P-450 Enzyme Inhibitors/metabolism , Cytochrome P-450 Enzyme System/metabolism , Cytochrome P-450 CYP2D6 , Area Under Curve , Microsomes, Liver/metabolism
10.
ACS Omega ; 5(39): 25432-25439, 2020 Oct 06.
Article in English | MEDLINE | ID: mdl-33043223

ABSTRACT

As a critical issue in drug development and postmarketing safety surveillance, drug-induced liver injury (DILI) leads to failures in clinical trials as well as withdrawals of approved drugs from the market. Therefore, it is important to identify DILI compounds in the early stages through in silico and in vivo studies. This is difficult with conventional safety testing methods, since the predictive power of most existing frameworks is insufficient to address this pharmacological issue. In our study, we employ a natural language processing (NLP)-inspired computational framework using convolutional neural networks and molecular fingerprint-embedded features. Our development set and independent test set contain 1597 and 322 compounds, respectively. These samples were collected from previous studies and matched against established chemical databases for structural validity. Our model achieves an average accuracy of 0.89, a Matthews correlation coefficient (MCC) of 0.80, and an AUC of 0.96. Our results show a significant improvement in AUC over the recent best model, with a relative boost of 6.67%, from 0.90 to 0.96. Also, based on our findings, the molecular fingerprint-embedded featurizer is an effective molecular representation for future biological and biochemical studies, alongside classic molecular fingerprints.
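
The 6.67% figure in this abstract is a relative improvement, not an absolute difference in AUC. The short sketch below reproduces the arithmetic; the function name `relative_improvement` is an assumption introduced for illustration.

```python
def relative_improvement(baseline, new):
    """Percentage change of `new` relative to `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# AUC values quoted in the abstract.
auc_gain_percent = relative_improvement(0.90, 0.96)
absolute_gain = 0.96 - 0.90
```

Here `auc_gain_percent` is about 6.67, while the absolute gain is 0.06 AUC units (6 percentage points); the two readings should not be confused when comparing studies.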

11.
J Chem Inf Model ; 60(3): 1101-1110, 2020 Mar 23.
Article in English | MEDLINE | ID: mdl-31873010

ABSTRACT

Traditional herbal medicine has been an inseparable part of traditional medical science in many countries throughout history. Nowadays, the use of herbal medicines in daily life as well as in clinical practice has gradually expanded to numerous Western countries, with positive impacts and acceptance. The continuous growth of the herbal consumption market has promoted the standardization and modernization of herbal-derived products in line with current pharmacological criteria. To store and extensively share this knowledge with the community and to serve scientific research, various herbal metabolite databases with diverse focuses have been developed with the support of modern advances. The advent of these databases has helped accelerate research on pharmaceuticals of natural origin. In this study, we critically review 30 herbal metabolite databases, discuss different related perspectives, and provide a comparative analysis of 18 accessible noncommercial ones. We hope to provide readers with fundamental information and multidimensional perspectives, from herbal medicines to modern drug discovery.


Subjects
Drug Discovery , Plants, Medicinal , Databases, Factual , Herbal Medicine , Medicine, Traditional
12.
BMC Genomics ; 20(Suppl 9): 951, 2019 Dec 24.
Article in English | MEDLINE | ID: mdl-31874637

ABSTRACT

BACKGROUND: Enhancers are non-coding DNA fragments that are crucial in gene regulation (e.g. transcription and translation). Because enhancers vary widely in location and are scattered freely across the 98% of the genome that is non-coding, their identification is more complicated than that of other genetic factors. To address this biological issue, several in silico studies have been carried out to identify and classify enhancer sequences among a myriad of DNA sequences using computational advances. Although recent studies have achieved improved performance, shortfalls in these learning models remain. To overcome the limitations of existing learning models, we introduce iEnhancer-ECNN, an efficient prediction framework that uses one-hot encoding and k-mers for data transformation and ensembles of convolutional neural networks for model construction, to identify enhancers and classify their strength. The benchmark dataset from Liu et al.'s study was used to develop and evaluate the ensemble models. A comparative analysis between iEnhancer-ECNN and existing state-of-the-art methods was carried out to fairly assess model performance. RESULTS: Our experimental results demonstrate that iEnhancer-ECNN performs better than other state-of-the-art methods using the same dataset. The accuracies of the ensemble model for enhancer identification (layer 1) and enhancer classification (layer 2) are 0.769 and 0.678, respectively. Compared to other related studies, improvements in the area under the receiver operating characteristic curve (AUC), sensitivity, and Matthews correlation coefficient (MCC) of our models are remarkable, especially for the layer 2 model, with about 11.0%, 46.5%, and 65.0%, respectively. CONCLUSIONS: iEnhancer-ECNN outperforms previously proposed methods with significant improvement in most of the evaluation metrics. Strong gains in the MCC of both layers are highly meaningful in assuring the stability of our models.


Subjects
Enhancer Elements, Genetic , Neural Networks, Computer , Sequence Analysis, DNA/methods
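
The one-hot encoding step named in the abstract of record 12 can be sketched as below. This is a generic illustration rather than the iEnhancer-ECNN code: the column order A, C, G, T, the all-zero vector for unknown symbols, the name `one_hot_encode`, and the toy fragment are all assumptions.

```python
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}


def one_hot_encode(sequence):
    """Encode a DNA string as a list of 4-element one-hot tuples.

    Unknown symbols (for example N) map to an all-zero vector.
    """
    return [ONE_HOT.get(base, (0, 0, 0, 0)) for base in sequence.upper()]


# Hypothetical toy fragment; benchmark sequences are much longer.
toy_encoding = one_hot_encode("acgtn")
```

The result is a length-by-4 matrix, the shape a convolutional network typically expects as input for sequence data.
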
13.
BMC Bioinformatics ; 20(Suppl 23): 634, 2019 Dec 27.
Article in English | MEDLINE | ID: mdl-31881828

ABSTRACT

BACKGROUND: Since protein-DNA interactions are essential to diverse biological events, accurately locating DNA-binding residues is necessary. This biological issue, however, is currently a challenging task in the post-genomic era, in which data on protein sequences have expanded very fast. In this study, we propose iProDNA-CapsNet, a new prediction model that identifies protein-DNA binding residues using an ensemble of capsule neural networks (CapsNets) on position-specific scoring matrix (PSSM) profiles. The use of CapsNets promises an innovative approach to determining the location of DNA-binding residues. The benchmark datasets introduced by Hu et al. (2017), i.e., PDNA-543 and PDNA-TEST, were used to train and evaluate the model, respectively. To fairly assess model performance, a comparative analysis between iProDNA-CapsNet and existing state-of-the-art methods was carried out. RESULTS: Under the decision threshold corresponding to a false positive rate (FPR) ≈ 5%, the accuracy, sensitivity, precision, and Matthews correlation coefficient (MCC) of our model are increased by about 2.0%, 2.0%, 14.0%, and 5.0% with respect to TargetDNA (Hu et al., 2017) and 1.0%, 75.0%, 45.0%, and 77.0% with respect to BindN+ (Wang et al., 2010), respectively. With regard to other methods that do not report their threshold settings, iProDNA-CapsNet also shows a significant improvement in performance on most of the evaluation metrics. Even with different patterns of change among the models, iProDNA-CapsNet remains the best model, with top performance in most of the metrics, especially MCC, which is boosted by about 8.0% to 220.0%. CONCLUSIONS: According to all evaluation metrics under various decision thresholds, iProDNA-CapsNet shows better performance than the two current best models (BindN+ and TargetDNA). Our proposed approach also shows that CapsNets can potentially be adopted in other biological applications.


Subjects
Amino Acids/chemistry , DNA-Binding Proteins/metabolism , Neural Networks, Computer , Software , Algorithms , Amino Acid Sequence , DNA/chemistry , Humans , Position-Specific Scoring Matrices , ROC Curve , Reproducibility of Results
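
Several of these records, including record 13, report the Matthews correlation coefficient (MCC), which is often preferred over accuracy when classes are imbalanced, as is typical when few residues bind DNA. The sketch below computes MCC from confusion-matrix counts using its standard definition; the counts are hypothetical and are not taken from any of the studies listed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts chosen only to illustrate class imbalance.
toy_mcc = matthews_corrcoef(tp=40, tn=900, fp=50, fn=10)
toy_accuracy = (40 + 900) / (40 + 900 + 50 + 10)
```

In this toy case the accuracy is 0.94 while the MCC is only about 0.57, which shows why percentage gains in MCC can look much larger than gains in accuracy.
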
14.
J Chem Inf Model ; 59(1): 1-9, 2019 Jan 28.
Article in English | MEDLINE | ID: mdl-30407009

ABSTRACT

Vietnam has a highly diverse practice of traditional medicine in which various combinations of herbs have been widely used as remedies for many types of diseases. Poor handwritten records and current text-based databases, however, complicate the process of conventionalizing and evaluating canonical therapeutic effects. In an effort to reorganize this valuable information, we provide the VIETHERB database ( http://vietherb.com.vn/ ) for herbs documented in Vietnamese traditional medicine. The database is designed to provide users with information on herbs together with side information including metabolites, diseases, morphologies, and geographical locations for each individual species. Our data in this release consist of 2,881 species, 10,887 metabolites, 458 geographical locations, and 8,046 therapeutic effects. The numbers of species-metabolite, species-therapeutic effect, species-morphology, and species-distribution binary relationships are 17,602, 2,718, 11,943, and 16,089, respectively. The information on Vietnamese herbal species can be easily accessed or queried using their scientific names. Species sharing side information can be found simply by clicking on the data. The database primarily serves as an open resource supporting users in studies on modernizing traditional medicine, computer-aided drug design, conservation of endangered plants, and other relevant experimental sciences.


Subjects
Databases, Factual , Plants, Medicinal , Humans , Vietnam
15.
BMC Genomics ; 20(Suppl 10): 971, 2019 Dec 30.
Article in English | MEDLINE | ID: mdl-31888464

ABSTRACT

BACKGROUND: Pseudouridine modification is the most common of the various kinds of RNA modification occurring in both prokaryotes and eukaryotes. This biochemical event has been shown to occur in multiple types of RNA, including rRNA, mRNA, tRNA, and nuclear/nucleolar RNA. Hence, gaining a holistic understanding of pseudouridine modification can contribute to the development of drug discovery and gene therapies. Although some laboratory techniques have achieved moderately good outcomes in pseudouridine identification, they are costly and require skilled work experience. We propose iPseU-NCP, an efficient computational framework to predict pseudouridine sites using the Random Forest (RF) algorithm combined with nucleotide chemical properties (NCP) generated from RNA sequences. The benchmark dataset collected from Chen et al. (2016) was used to develop iPseU-NCP and fairly compare its performance with other methods. RESULTS: Under the same experimental settings, compared with three state-of-the-art methods, iPseU-CNN, PseUI, and iRNA-PseU, the Matthews correlation coefficient (MCC) of our model increased by about 20.0%, 55.0%, and 109.0% when tested on the H. sapiens (H_200) dataset and by about 6.5%, 35.0%, and 150.0% when tested on the S. cerevisiae (S_200) dataset, respectively. This significant growth in MCC is very important since it ensures the stability and performance of our model. On those two independent test datasets, our model also presented higher accuracy, with a success rate boosted by 7.0%, 13.0%, and 20.0% and by 2.0%, 9.5%, and 25.0% when compared to iPseU-CNN, PseUI, and iRNA-PseU, respectively. For the majority of other evaluation metrics, iPseU-NCP demonstrated superior performance as well. CONCLUSIONS: iPseU-NCP, combining the RF and NCP-encoded features, showed better performance than other existing state-of-the-art methods in the identification of pseudouridine sites. This also offers an optimistic outlook for addressing biological issues related to human diseases.


Subjects
Computational Biology/methods , Pseudouridine/metabolism , RNA/metabolism , RNA/genetics , Software
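
The nucleotide chemical property (NCP) features named in the abstract of record 15 are commonly encoded as three binary values per nucleotide: ring structure, functional group, and hydrogen-bond strength. The sketch below uses that widely cited convention; whether iPseU-NCP uses exactly this mapping is an assumption, as are the function name `ncp_encode` and the toy fragment.

```python
# (ring structure, functional group, hydrogen bonding):
# purine = 1 / pyrimidine = 0, amino = 1 / keto = 0, weak = 1 / strong = 0.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def ncp_encode(rna_sequence):
    """Flatten an RNA string into NCP features, three values per nucleotide.

    T is treated as U; unknown symbols become (0, 0, 0).
    """
    features = []
    for base in rna_sequence.upper().replace("T", "U"):
        features.extend(NCP.get(base, (0, 0, 0)))
    return features


# Hypothetical 4-nt fragment; benchmark windows are longer.
toy_features = ncp_encode("ACGU")
```

A window of L nucleotides yields 3L features, which can be passed directly to a tabular classifier such as a random forest.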